Improving Classification with Cost-sensitive Approach for Distributed Databases

Authors

  • Maria Muntean
  • Honoriu Vălean
  • Ioan Ileană
  • Corina Rotar
Abstract

A problem arises in data mining when Support Vector Machines are used to classify unbalanced datasets. Because of the uneven class distribution and the classifier's soft margin, the algorithm tries to maximize overall accuracy on the dataset, and in doing so it may misclassify many instances of weakly represented classes, treating them as outlying values in the dataset and thus ignoring them. This paper introduces the Enhancer, a new algorithm that improves cost-sensitive classification for Support Vector Machines by replicating the instances of the underrepresented classes during the training step. We have found that oversampling the instances of the class of interest helps the Support Vector Machine algorithm overcome the effect of the soft margin. As a result, it classifies future instances of this class of interest more accurately. Our experiments indicate that the algorithm performs well on distributed databases.
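The replication step described in the abstract can be illustrated with a minimal sketch. This is not the authors' Enhancer algorithm, whose details are not given here; it only shows the general idea of multiplying the instances of one class before training. The function name `oversample_by_replication`, the `factor` parameter, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def oversample_by_replication(X, y, target_class, factor):
    """Return copies of X and y in which every instance of target_class
    appears `factor` times (factor=1 leaves the data unchanged).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    X_out, y_out = [], []
    for features, label in zip(X, y):
        # Minority-class instances are repeated; all others are kept once.
        copies = factor if label == target_class else 1
        for _ in range(copies):
            X_out.append(list(features))
            y_out.append(label)
    return X_out, y_out


# Toy data (made up): 6 majority instances (label 0), 2 minority (label 1).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1], [0.3, 0.2],
     [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_by_replication(X, y, target_class=1, factor=3)
print(Counter(y))      # class counts before replication
print(Counter(y_bal))  # class counts after replication
```

The enlarged training set could then be passed to any SVM implementation. With plain replication, each duplicated instance contributes its slack penalty several times, which is one way to read the abstract's claim that oversampling helps the classifier overcome the soft margin.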


Similar resources

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...


A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique that classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training amounts to solving an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...


A New Framework for Distributed Multivariate Feature Selection

Feature selection is an important issue in classification. Selecting good features by the criteria of maximum relevance to the class label and minimum redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized setting. In this paper, we suggest a distributed version of the mRMR featu...


Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...


A New Approach to Software Cost Estimation by Improving Genetic Algorithm with Bat Algorithm

Because of the low accuracy of estimation and the uncertainty of the techniques used in the past for Software Cost Estimation (SCE), software producers face a high risk in practice with regard to software projects, and they often fail in such projects. Thus, SCE, as a complex issue in software engineering, requires new solutions, and researchers make an effort to make use of Meta-heuristic algorithms ...



Publication date: 2011